RNA co-transcriptional folding has long been suspected 
to play an active role in helping proper native folding of 
ribozymes and structured regulatory motifs in mRNA un- 
translated regions. Yet, the underlying mechanisms and 
coding requirements for efficient co-transcriptional folding 
remain unclear. Traditional approaches are intrinsically
limited in dissecting RNA folding paths, as they rely on
sequence mutations or circular permutations that typically 
perturb both RNA folding paths and equilibrium structures. 
Here, we show that exploiting sequence symmetries instead 
of mutations can circumvent this problem by essentially 
decoupling folding paths from equilibrium structures of 
designed RNA sequences. Using bistable RNA switches 
with symmetrical helices conserved under sequence reversal, 
we demonstrate experimentally that native and transiently 
formed helices can guide efficient co-transcriptional folding 
into either long-lived structure of these RNA switches. Their 
folding path is controlled by the order of helix nucleations 
and subsequent exchanges during transcription, and may 
also be redirected by transient antisense interactions. Hence, 
transient intra- and intermolecular base pair interactions can 
effectively regulate the folding of nascent RNA molecules into 
different native structures, provided limited coding require- 
ments, as discussed from an information theory perspective. 
This constitutive coupling between RNA synthesis and RNA 
folding regulation may have enabled the early emergence of 
autonomous RNA-based regulation networks. 

Introduction 

RNA molecules exhibit a wide range of functions from essential
components of the transcription/translation machinery [1] to natural
or in vitro selected ribozymes [2, 3] or aptamers [4, 5, 6] and different
classes of gene expression regulators (e.g. miRNA, siRNA,
riboswitches) [7, 8, 9, 10, 11, 12, 13]. The functional control of
many of these RNA molecules hinges on the formation of spe- 
cific base pairs in cis or trans and secondary structure rearrange- 
ments between long-lived alternative folds. Yet, because of their 
limited 4-letter alphabet and strong base pair stacking energies, 
RNAs are also prone to adopt long-lived misfolded structures [14],
as observed for instance upon heat renaturation. Hence, efficient
RNA folding paths leading to properly folded structures play an
important role in the regulatory function of non-coding RNAs and
mRNA untranslated regions [14].

It has long been proposed [15, 16, 17] that, during transcription,
the progressive folding of nascent RNAs limits the number
of folding pathways, presumably facilitating their rapid folding 
into proper native structures. It is not clear, however, whether 
native domains fold sequentially and independently from one 
another or whether co-transcriptional folding paths result from 
more intricate interactions between domains and individual helices.
Transcriptional RNA switches provide interesting examples
of co-transcriptional folding paths with local competition between 
newly formed and alternative helices. Such natural RNA switches 
are typically found in virus or plasmid genomes [18, 19, 20, 21, 22]
and in bacterial mRNA untranslated regions where they regu- 
late gene expression at the level of transcription elongation (e.g. 
through termination/antitermination mechanisms) [23, 24, 25] or
at the level of translation initiation (e.g. through sequestration 
of Shine-Dalgarno motifs) [26, 27, 28, 29, 30, 31, 32]. The structural
changes controlling their regulatory function may correspond
to a switch in equilibrium structure or in co-transcriptional fold- 
ing path caused by binding an effector (e.g. a protein, a small 
metabolite or an antisense sequence) [33]. Alternatively, RNA
switches may operate through spontaneous or assisted relaxation 
of an initially metastable co-transcriptional fold(33|]. Hence, RNA 
switches can have stringent needs to control their folding between 
alternative structural folds, which makes them ideal candidates to 
dissect RNA co-transcriptional folding mechanisms and estimate 
the minimal sequence constraints to encode them. 

Several inspiring reports have demonstrated the importance of 
co-transcriptional folding (2l|,[33, 111 [lijsljsi, [11 lig], and folding
pathways of E. coli RNaseP RNA [41, 42] and Tetrahymena
group I intron [43, 44] have been probed using circularly permuted
variants of their wild type sequences (see Discussion). Yet,
dissecting folding paths of natural RNAs remains generally dif- 
ficult due to two fundamental issues: i) sequence mutations or 
circular permutations generally affect both RNA folding paths 
and equilibrium structures (hence preventing independent prob- 
ing of folding paths on their own), and ii) many natural non- 
coding RNAs have likely evolved to perform multiple interdepen- 
dent functions, which are all encoded on their primary sequence 
and thereby all potentially affected, directly or indirectly, by se- 
quence mutations. 

To circumvent these limitations, we propose to use artificial 
RNA switches, presumably void of biological functions, and in- 
vestigate how to efficiently encode their folding paths by exploit- 
ing simple sequence symmetries, instead of extensive (and pos- 
sibly non-conclusive) mutation studies. Beyond specific exam- 
ples of natural or designed RNA sequences, we aim at delineat- 
ing general mechanisms and coding requirements for efficient co- 
transcriptional folding paths. 

In a nutshell, we have designed a pair of synthetic RNA 
switches sharing strong sequence symmetries, so that both 
molecules partition, at equilibrium, into equivalent branched and 
rod-like nested structures with nearly the same free energy. Yet, in 
spite of this structural equivalence between the two RNA switches 
at equilibrium, we demonstrate that their folding path can be en- 
coded to guide the first RNA switch exclusively into the branched 
structure while the other switch adopts instead the rod-like nested 
structure by the end of transcription. This shows that folding paths 
do not simply result from the sequential formation of native he- 
lices in their order of appearance during transcription (i.e. 'se- 
quential folding', see Discussion). Instead, efficient folding paths 
rely on the relative stability between native and non-native he- 
lices together with their precise positional order along the 5'-3' 
oriented sequence (i.e. 'encoded co-transcriptional folding'). Furthermore,
we show that efficient folding paths can be redirected
through transient antisense interaction during transcription, sug- 
gesting an intrinsic and possibly ancestral coupling between RNA 
synthesis and folding regulation. 

Materials and Methods 

RNA switch design. 

RNA switches with encoded folding paths depicted in
Figs. 1&2 were designed using the Kinefold online server [45, 46]
(http://kinefold.curie.fr). A starting GG doublet was chosen to ensure
efficient T7 transcription. The sequence of the "direct" RNA
switch (i.e. 5'-ABCD-3', 73 nucleotides, Fig. 2) is:
5'-GGAACCGUCUCCCUCUGCCAAAAGGUAGAGGGAGAUGGAGCAUCUCUCUCUACGAAGCAGAGAGAGACGAAGG-3'.
The "reverse" RNA switch has exactly the opposite sequence (or reversed
orientation), i.e. 5'-DCBA-3': the same nucleotides read in the
opposite order, not the reverse complement. A "reverse" sequence with
a single mutation U38/C38 was also studied to unambiguously
establish the correspondence between branched versus rod-like
structures and the two migrating bands on nondenaturing
polyacrylamide gels (Fig. 3). It is:
5'-GGAAGCAGAGAGAGACGAAGCAUCUCUCUCUACGAGGCAGAGGGAGAUGGAAAACCGUCUCCCUCUGCCAAGG-3'
(mutated C at position 38). Complementary DNA
oligonucleotides including T7 promoter and KpnI/StuI/BamHI
restriction sites at sequence extremities were bought from
IBA-Naps, Germany.
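
The relation between the three sequences above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical helpers introduced here, and the sequences are copied from the text.

```python
# Direct switch (5'->3'), copied from the Methods text in three pieces.
DIRECT = (
    "GGAA"
    "CCGUCUCCCUCUGCCAAAAGGUAGAGGGAGAUGGAGC"
    "AUCUCUCUCUACGAAGCAGAGAGAGACGAAGG"
)


def reverse_sequence(seq: str) -> str:
    """Same nucleotides read in the opposite order (NOT the reverse complement)."""
    return seq[::-1]


def point_mutation(seq: str, position: int, old: str, new: str) -> str:
    """Substitute one base; `position` is 1-based, as in the notation U38/C38."""
    index = position - 1
    if seq[index] != old:
        raise ValueError(f"expected {old} at position {position}, found {seq[index]}")
    return seq[:index] + new + seq[index + 1:]


REVERSE = reverse_sequence(DIRECT)                    # 5'-DCBA-3'
REVERSE_U38C = point_mutation(REVERSE, 38, "U", "C")  # mutant used for band assignment

print(len(DIRECT), REVERSE)
print(REVERSE_U38C)
```

Reversing the direct sequence and substituting position 38 reproduces the mutated "reverse" sequence quoted above, which is how the position of the U38/C38 mutation can be read off the printed sequence.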

Sequence cloning and in vitro transcription. 

Sequences were inserted into pUC19 plasmid (between KpnI and
BamHI restriction sites) using enzyme removal kits (Qiagen)
and cloned into calcium competent E. coli (DH5α strain) following
standard cloning protocols. Following plasmid extraction
(Genomed kit), inserts were sequenced and cut at the StuI
blunt restriction site located at the end of the desired DNA template.
Run off transcription was performed in vitro using T7 RNA
polymerase (New England Biolabs) for up to 4-5 hours at 37°C
or 25°C. Heat renaturation was performed from 85°C to room 
temperature in about 10 min (starting from 95 °C gave the same 
results). Renatured and co-transcriptional native structures were 
then separated on 12% 19:1 acryl-bisacrylamide nondenaturing 
gels (1X TAE, temperature <10°C) and observed using ethidium
bromide staining (0.1 µg/µl). Ethidium bromide slightly re-equilibrates
the molecule equilibrium partition between branched
and rod-like structures during heat renaturation but has no measurable
effect at room temperature on the strongly biased partition
between co-transcriptional native structures. Controls using denaturing
gels (6%, 19:1 acryl-bisacrylamide, RNA in formamide and
8M urea, 50°C) showed that >90% of transcripts had the expected
run off transcription length. Virtually no other bands (i.e. <5% of 
total transcript) were observed on nondenaturing gels either (i.e. 
apart from the single or double bands shown in Figs. 2-4). 

Results 

The results section is organized into two complementary subsections.
The first one is primarily experimental and demonstrates,
using sequence symmetries, the basis for encoding efficient folding
paths with a pair of 'symmetrically equivalent' RNA switches
adopting either their branched or rod-like structure by the end of
transcription. The second subsection is theoretical and discusses,
from an information content perspective [47], and beyond sequence
specific examples, the coding requirement for such efficient
co-transcriptional folding.

FIG. 1: Encoded co-transcriptional folding path of a bistable RNA
switch. A. Bistable generic sequence with hierarchically overlapping
helices. B. Opposite co-transcriptional folding paths for the direct and
reverse sequences rely on small asymmetries in helix length in the direction
of transcription (i.e. |Pa| > |Pc| for the direct sequence and |Pc| > |Pb|
for the reverse sequence). C. Either branched or rod-like native structures
are obtained depending on the direction of transcription, although both
structures can be designed to co-exist at equilibrium.

Co-transcriptional folding of "direct" and "reverse" switches 

We decided to investigate the basic mechanisms and coding re- 
quirements for efficient RNA folding paths with a stringent test 
case. Following the RNA switch design depicted on Fig. 1, we set 
out to encode two (oppositely oriented) folding paths on the same 
RNA sequence. The proposed bistable RNA switch should form 
FIG. 2: Opposite co-transcriptional folding paths of a pair of RNA
switches with 'direct' and 'reverse' sequences (i.e. 5'-ABCD-3' vs
5'-DCBA-3'). Structures 1D and 1R (resp. 2D and 2R) of the direct
and reverse switches are energetically equivalent because of helix
symmetries; dashed lines indicate mirror symmetry of Pa, Pb, Pc and Pd,
which are therefore conserved under sequence reversal relating direct and
reverse switches. Despite these strong similarities between D and R
structures at equilibrium, direct and reverse switches display 'opposite'
co-transcriptional folding paths (direct switch into structure 1D and reverse
switch into structure 2R) guided through an encoded helix persistence (left) or
exchange (right) during in vitro transcription using T7 RNA polymerase
(see Materials and Methods).

a branched structure (1) and a rod-like structure (2) with approxi- 
mately the same free energy at equilibrium, and yet be guided into 
either one of these structures only, depending on the direction of 
synthesis li^ . Fig. 1.

In practice, however, 5'-3' vs 3'-5' folding paths cannot be
probed on the same RNA sequence, as there is no RNA polymerase
known to perform transcription in the 'opposite' (3'-to-5') direction.




FIG. 3: Correspondence between branched versus rod-like structure
and migrating bands. A single mutation U38/C38 on the reverse
sequence, Ru/c (see blue u/c mutation in Fig. 2) unambiguously
demonstrates the correspondence between the stabilized branched structure and
the lower band on the gel (see text).

Hence, instead of studying a single RNA sequence, we have actually
used a pair of RNA switches with exactly opposite sequences,
i.e. 5'-ABCD-3' and 5'-DCBA-3' (see Materials and Methods).
It is important to note that, in general, such pairs of RNA
molecules do not adopt related structures at equilibrium, due to
the large asymmetry between free energies of stacking base pairs
with reversed orientation (e.g. 5'-GC/GC-3' ~ -3.4 kcal/mol and
3'-GC/GC-5' = 5'-CG/CG-3' ~ -2.4 kcal/mol). For this reason, the
pair of direct (D) and reverse (R) RNA switches we have designed
(Fig. 2) forms at equilibrium a branched structure (1) and a rod-like
structure (2) constructed around symmetric helices that are exactly
conserved under sequence reversal (dashed lines on Fig. 2 indicate
mirror symmetry of Pa, Pb, Pc and Pd helices). Thus, comparing
transcription products of the direct and reverse sequences
probes the directionality of their folding paths while keeping the
equilibrium structures of both switches essentially equivalent by
symmetry; since structures 1D and 1R (resp. 2D and 2R) of the
direct and reverse switches are built on the same helices Pa and Pb
(resp. Pc and Pd), their sole free energy difference concerns the
small sequence dependent contribution of single stranded regions
in the branched (resp. rod-like) structure (GNRA tetraloops and
all other sequence-dependent tabulated loops have been avoided).
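
The notion of a helix "conserved under sequence reversal" can be illustrated on the direct sequence given in Materials and Methods. In the sketch below, the two segments (nucleotides 5-19 and 24-38) were picked by inspecting that sequence and are assumed here to be the two strands of the 5' hairpin; these boundaries are an assumption of this example and are not stated in the text. Each strand reads the same in both directions, so the same two strands reappear in the reverse switch.

```python
DIRECT = (
    "GGAA"
    "CCGUCUCCCUCUGCCAAAAGGUAGAGGGAGAUGGAGC"
    "AUCUCUCUCUACGAAGCAGAGAGAGACGAAGG"
)
REVERSE = DIRECT[::-1]

# Watson-Crick and wobble pairs allowed in a helix.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def is_mirror_symmetric(segment: str) -> bool:
    """True if the strand reads identically 5'->3' and 3'->5'."""
    return segment == segment[::-1]


def can_pair_antiparallel(strand5: str, strand3: str) -> bool:
    """True if the two strands form a gap-free antiparallel helix."""
    return len(strand5) == len(strand3) and all(
        (a, b) in PAIRS for a, b in zip(strand5, reversed(strand3))
    )


# Assumed strand boundaries (1-based nucleotides 5-19 and 24-38).
STRAND_5 = DIRECT[4:19]
STRAND_3 = DIRECT[23:38]

print(STRAND_5, is_mirror_symmetric(STRAND_5))
print(STRAND_3, is_mirror_symmetric(STRAND_3))
print(can_pair_antiparallel(STRAND_5, STRAND_3))
print(STRAND_5 in REVERSE and STRAND_3 in REVERSE)
```

Because both strands are mirror symmetric, sequence reversal leaves every stacked base pair of the helix unchanged, which is why the stacking asymmetry quoted above does not affect it.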

Fig. 2 demonstrates that, in spite of these strong similarities be- 
tween equilibrium structures, the two RNA switches are indeed 
guided towards two distinct native structures upon in vitro
transcription (see Materials and Methods). The correspondence
between branched versus rod-like structures and migrating bands
on polyacrylamide gels was unambiguously established using a
single mutation U38/C38 on the reverse sequence (Fig. 2). This
mutation stabilizes the branched structure (UG>CG) relative to
the rod-like structure (AU<AC) at equilibrium, hence demonstrating
the correspondence between branched structure and lower
band (Fig. 3). Note, however, that this mutation also perturbs the
co-transcriptional folding path by redirecting about half of the
molecules into the branched structure, hence illustrating the
difficulty of dissecting folding paths independently from equilibrium
structures with sequence mutations only (see Introduction).

These results strongly support the co-transcriptional folding
principles depicted on Fig. 1, which primarily rely on the difference
in helix length (i.e. |Pa| > |Pc| for the direct switch and
|Pc| > |Pb| for the reverse switch) to code for the co-transcriptional
formation of structures 1D and 2R, respectively. This small asymmetry
between successive overlapping helices in the direction of
transcription induces a divergence in structural cascades between
the two co-transcriptional folding paths; namely, the red helix Pc
cannot displace and replace the longer (and stronger) helix Pa
previously formed by the direct switch during transcription
[|Pa-Pc|/|Pc| ~15%], while Pc does displace and replace the shorter
(and weaker) helix Pb initially formed by the nascent reverse
sequence [|Pc-Pb|/|Pc| ~30%]. The efficacy of these folding paths
using T7 polymerase appears maximal (~100%) for the direct
sequence while it is about 90% for the reverse switch at 37°C,
suggesting that the branch migration exchange between Pb and Pc is
not always successful in these conditions (T7 transcription rate is
about 200-400 nt/s at 37°C); however, we found that the folding
bifurcation is almost always achieved (~100%, Fig. 4) at the lower
temperature of 25°C, which decreases the T7 transcription rate
3- to 4-fold [50]. This small improvement in bifurcation efficiency
suggests that the decreasing elongation rate indeed prevails over other
opposite kinetic factors at lower temperature. In particular, nucleation
of Pc (which probably involves the opening of one or two
base pairs of Pb, as sketched in Fig. 2) and the subsequent branch
migration between Pb and Pc are probably both slowed down at
lower temperature.

FIG. 4: Influence of temperature and transient antisense interactions
on co-transcriptional folding. Equilibrium and native structures of
reverse switch (R) with in vitro T7 transcription at 25°C (left, see text) and
under in vitro T7 transcription in presence of 0.3 nmol/µl of the 7-nt
antisense DNA oligonucleotide CCTCTAC (right, see text). Structures are
separated on a 12% 19:1 acryl-bisacrylamide nondenaturing gel
(temperature <10°C) and observed using ethidium bromide staining as on Fig. 2
(see Materials and Methods).
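
To give an order of magnitude for the time window discussed above, the short sketch below converts the quoted elongation rates into the time needed to synthesize the 73-nt switch. It assumes a constant elongation rate and ignores initiation and pausing; combining the 200-400 nt/s range with the 3- to 4-fold slowdown into a range at 25°C is an assumption of this example.

```python
SWITCH_LENGTH_NT = 73                # length of the designed switches
RATE_37C_NT_PER_S = (200.0, 400.0)   # T7 elongation rate range quoted in the text
SLOWDOWN_25C = (3.0, 4.0)            # fold decrease of the rate at 25 degrees C


def transcription_time_s(length_nt: int, rate_nt_per_s: float) -> float:
    """Time to synthesize `length_nt` nucleotides at a constant rate."""
    return length_nt / rate_nt_per_s


# (shortest, longest) synthesis time at 37 degrees C.
time_37c = (
    transcription_time_s(SWITCH_LENGTH_NT, RATE_37C_NT_PER_S[1]),
    transcription_time_s(SWITCH_LENGTH_NT, RATE_37C_NT_PER_S[0]),
)

# (shortest, longest) synthesis time at 25 degrees C.
time_25c = (
    transcription_time_s(SWITCH_LENGTH_NT, RATE_37C_NT_PER_S[1] / SLOWDOWN_25C[0]),
    transcription_time_s(SWITCH_LENGTH_NT, RATE_37C_NT_PER_S[0] / SLOWDOWN_25C[1]),
)

print(f"37C: {time_37c[0]:.2f}-{time_37c[1]:.2f} s")
print(f"25C: {time_25c[0]:.2f}-{time_25c[1]:.2f} s")
```

Under these assumptions the whole switch is synthesized in a fraction of a second at 37°C, so helix nucleation and branch migration must complete within a comparably short time for the exchange between Pb and Pc to succeed.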

Overall, this demonstrates that the competition between native 
and non-native helices can lead to efficient co-transcriptional fold- 
ing paths of RNA switches independently from their actual equi- 
librium structures. 
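
The competition rule described above can be caricatured in a few lines. The sketch below is a deliberately crude toy model and not the Kinefold simulation used in this work: it assumes that a newly transcribed segment first pairs with a free partner if one exists, and otherwise displaces an existing helix only if the new helix is longer. The helix lengths (15, 9, 13 and 13 base pairs) are hypothetical values chosen to be consistent with the ~15% and ~30% asymmetries quoted above.

```python
# Helix name -> (pair of sequence segments it joins, assumed length in base pairs).
HELICES = {
    "Pa": (frozenset("AB"), 15),
    "Pb": (frozenset("CD"), 9),
    "Pc": (frozenset("BC"), 13),
    "Pd": (frozenset("AD"), 13),
}


def fold_cotranscriptionally(order: str) -> list:
    """Return the helices present once all segments in `order` are transcribed."""
    transcribed = set()
    formed = set()

    def blockers(name):
        # Already formed helices that share a segment with `name`.
        segments = HELICES[name][0]
        return {other for other in formed if HELICES[other][0] & segments}

    for segment in order:
        transcribed.add(segment)
        candidates = [
            name
            for name, (segments, _) in HELICES.items()
            if segment in segments and segments <= transcribed and name not in formed
        ]
        free = [name for name in candidates if not blockers(name)]
        if free:
            # Pairing with a free partner needs no helix exchange.
            formed.add(max(free, key=lambda name: HELICES[name][1]))
            continue
        displacing = [
            name
            for name in candidates
            if all(HELICES[name][1] > HELICES[b][1] for b in blockers(name))
        ]
        if displacing:
            winner = max(displacing, key=lambda name: HELICES[name][1])
            formed -= blockers(winner)
            formed.add(winner)
    return sorted(formed)


print("direct  5'-ABCD-3':", fold_cotranscriptionally("ABCD"))
print("reverse 5'-DCBA-3':", fold_cotranscriptionally("DCBA"))
```

With these assumed lengths the toy rule sends the direct order into Pa and Pb (branched) and the reverse order into Pc and Pd (rod-like), mirroring the observed bifurcation; it says nothing about the efficiencies or kinetics.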

Moreover, we found that the folding path of the reverse switch
could be significantly redirected towards structure 1R (~50%)
through transient antisense interactions (Fig. 4, right). This is
simply achieved using a 7-nt-long antisense oligonucleotide designed
to interfere, through competing interactions, with the encoded
exchange between helices Pb and Pc (Fig. 5). Note, in particular,
that the hybridized antisense probe is eventually displaced
by the longer (and stronger) downstream helix Pa of each nascent
reverse sequence (no global shift of equilibrium bands is observed
between transcriptional folds formed in presence or absence of
antisense probe, Fig. 4). This shows that transient antisense
interactions can, in principle, control multiple turnovers of redirected
folding pathways.
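
As a simple consistency check, the RNA site complementary to the 7-nt DNA oligonucleotide CCTCTAC (Fig. 4) can be located on the reverse switch. The sketch below only searches for a perfectly complementary antiparallel site; the helper name is hypothetical and no binding energetics are modelled.

```python
DIRECT = (
    "GGAA"
    "CCGUCUCCCUCUGCCAAAAGGUAGAGGGAGAUGGAGC"
    "AUCUCUCUCUACGAAGCAGAGAGAGACGAAGG"
)
REVERSE = DIRECT[::-1]

# RNA base that pairs with each DNA base (Watson-Crick only).
RNA_PARTNER_OF_DNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_target_of_dna_oligo(oligo: str) -> str:
    """RNA sequence (5'->3') that pairs antiparallel with a DNA oligo (5'->3')."""
    return "".join(RNA_PARTNER_OF_DNA[base] for base in reversed(oligo))


ANTISENSE = "CCTCTAC"
target = rna_target_of_dna_oligo(ANTISENSE)
start = REVERSE.find(target)  # 0-based index, -1 if absent

print(target, "found at nucleotides", start + 1, "to", start + len(target))
```

The search returns a single perfectly complementary site on the reverse switch, which is the minimum needed for the oligonucleotide to compete transiently with intramolecular helices during transcription.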

Hence, transient intra- and intermolecular base pair interactions 
can efficiently regulate the folding of nascent RNA molecules 
between alternative long-lived native structures, irrespective of 
their actual thermodynamic stability. Indeed, once formed, the 
co-transcriptional structures 1D and 2R remain trapped
out-of-equilibrium for more than a day at room temperature (data
not shown) demonstrating that these RNA switches can reliably 
store information on physiological time scales with their co- 
transcriptionally folded structures. In another context, the ability 
to control folding between distinct long-lived structures of nucleic 
acids using electrical ^4^ or thermal i5lll stimuli, instead of tran- 
scription, could also lead to nanotechnology applications. 

While our conclusions are based on particular examples of syn- 
thetic RNA switches related by sequence reversal and helix sym- 
metries, we want to stress that these strong symmetry constraints 
are solely instrumental in demonstrating the possible indepen- 
dence between encoded folding paths and low free energy RNA 
structures. These symmetries are not directly used nor necessary 
to achieve efficient co-transcriptional folding. On the contrary, 
imposing such strong sequence symmetries greatly limits the ad- 
ditional "information content" that can possibly be encoded on the 
sequence. In the next subsection, we discuss how this use of
sequence symmetries can actually be formalized to provide quantitative
estimates on the minimum coding requirement for selective
folding paths of generic RNA switches.

Bounding coding requirement through sequence symmetries 

In this subsection, we discuss how sequence symmetries can 
actually be used to estimate necessary base pairing conditions to 
encode efficient co-transcriptional folding paths. This requires,
however, reformulating base pairing conditions from an information
content perspective, following the approach developed for
biomolecular sequences in refs [47, 48].

In the following, we first establish a simple conservation law 
for information content. We then argue that upper bounds for the
coding requirement of selective folding paths (or other molecular
features) can be estimated by restricting the available coding
space with strong sequence symmetries. Ultimately, upper bounds
on coding requirements are related to the likelihood that a particular
feature might arise from natural or in vitro selection.
FIG. 5: Antisense regulation of co-transcriptional folding
paths. Interpretation of the encoded (left) and redirected (right)
co-transcriptional folding paths of the reverse switch (Fig. 4). This
is based on simulations performed using the Kinefold server [45]
(http://kinefold.curie.fr). To simulate the effect of antisense
interaction, the 7mer and RNA switch sequences are actually attached
together via an inert linker (made of 'X' bases that do not pair).

Let us first recall what the information content of a biomolecule 
is, before showing how it can actually be estimated for designed 
RNA switches using sequence symmetries. 

The information content $I$ of a functional biomolecule corresponds
to the number of sequence constraints that have to be conserved
to maintain its function under random mutations [47, 48].
Expressed in nucleotide units, the maximum information content
that can be encoded on an $N$-nucleotide-long RNA sequence is
precisely $I_{\max} = N$ nucleotides, which define a unique sequence
amongst all $D^N$ different RNA sequences with $N$ nucleotides,
where $D$ is the size of the coding alphabet ($D=4$ for nucleic acids;
$D=20$ for proteins). The fact that neutral mutations can accumulate
on an RNA sequence without altering its function implies
$I < I_{\max} = N$ and can be simply translated into a conservation
law, $I + J = N$, where $J$, the sequence entropy, corresponds to
the number of unconstrained nucleotides which generate $\Omega = D^J$
RNA sequences with the same function. Hence $J = \log_D(\Omega)$ and
$I = N - J = \log_D(D^N) - \log_D(\Omega)$ [47, 48]. While $J$ and $I$ can
be inferred by sampling sequence space as demonstrated in [48],
their contributions to $N$ do not usually correspond to a simple
partition between $J$ "meaningless" and $I$ "meaningful" bases since
many sequence constraints actually arise from non-local base-base
correlations, as shown, in particular, with base pair covariations
between homologous RNA sequences. For this reason, it is usually
instructive to quantify information content separately within
paired and unpaired regions, when considering RNA structures.
Since about 70% of all bases are usually paired in low energy RNA
structures, these base pairs typically contribute the most to total
information content and thereby to the minimum coding requirement
for a given RNA function. Hence, in the following, we will
focus, for simplicity, on short RNA sequences (e.g. $N < 100$
nucleotides) and consider, at first, only base paired regions, ignoring
both wobble base pairs and unpaired regions.
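
The conservation law above translates directly into a small calculation. The sketch below is illustrative only: the function names are introduced here, and the numerical example (a 10-nt sequence with 4096 functional variants) is a made-up illustration, not data from this work.

```python
import math


def sequence_entropy(n_functional_sequences: float, alphabet_size: int = 4) -> float:
    """J = log_D(Omega): number of effectively unconstrained positions."""
    return math.log(n_functional_sequences, alphabet_size)


def information_content(length: int, n_functional_sequences: float,
                        alphabet_size: int = 4) -> float:
    """I = N - J: number of effectively constrained positions."""
    return length - sequence_entropy(n_functional_sequences, alphabet_size)


# Hypothetical example: N = 10 nucleotides, Omega = 4**6 functional sequences.
N = 10
OMEGA = 4 ** 6
J = sequence_entropy(OMEGA)
I = information_content(N, OMEGA)
print(f"J = {J:.2f} nt, I = {I:.2f} nt, I + J = {I + J:.2f} nt")
```

A single functional sequence (Omega = 1) gives J = 0 and I = N, while a function shared by all D^N sequences gives I = 0, the two limits of the conservation law.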

With these crude initial assumptions, the information content of
a short RNA sequence adopting a unique stable secondary structure
can be estimated as $I \simeq J \simeq N/2$, since the first base of
each Watson-Crick base pair can be chosen arbitrarily. Hence,
overall, short RNA sequences adopting a unique stable secondary
structure present a large sequence entropy $J = J_u \simeq N/2$ that
can, in principle, be used to encode additional features such as al- 
ternative, low energy structures [[sl , [sll or possibly other molecu- 
lar properties like co-transcriptional folding pathways, as shown 
in the previous subsection. For example, encoding the simple
bistable RNA of Fig. 1A but with exactly overlapping helices
($|Pa| = |Pb| = |Pc| = |Pd|$) requires that $I \simeq 3N/4$ nucleotides
be fixed once the initial $J = J_b \simeq N/4$ bases are chosen arbitrarily
(e.g. in the first pairing region). Including also as stable structure
the pseudoknot constructed around the same four complementary
regions (so as to obtain a tri-stable RNA molecule) then implies
that each pairing region is self-complementary and that only
$J \simeq N/8$ bases can be chosen arbitrarily (e.g. in the first half
of the first pairing region). Similarly, designing the same bistable
RNA of Fig. 1A but with symmetrical helices (conserved under 5'-3'
sequence inversion) implies that only around $J = J_{bs} \simeq N/8$
bases can be chosen arbitrarily (e.g. in the first half of the first
pairing region).

Similar estimates can be made including wobble base pairs (GU
and UG) in addition to Watson-Crick base pairs (GC, CG, AU
and UA). In that case, the available sequence entropy becomes
$J_u \simeq \log_4(6) \cdot N/2 \simeq 0.65N$ for a molecule with a unique
structure (i.e. with 6 possible base pairs), while we get for the
previous bistable RNA, $J_b \simeq \log_4(14) \cdot N/4 \simeq 0.48N$ (i.e.
with 14 possible quadruplet "circuits" including circular
permutations: 2×AUAU, 2×GCGC, 2×GUGU, 4×AUGU and 4×GCGU) or
$J_{bs} \simeq \log_4(14) \cdot N/8 \simeq 0.24N$ with additional symmetric helix
restriction, as above. Hence, the possibility of wobble pairs tends to
increase sequence entropy $J$, and to concomitantly decrease the
number of sequence constraints $I$. On the other hand, including
a significant fraction of wobble pairs (e.g. a third) in designed
structures tends also to facilitate the formation of unwanted,
alternative low energy structures (e.g. with fewer wobble pairs). Thus,
including wobble and WC pairs on an equal footing effectively
underestimates coding requirements, since preventing the formation
of unwanted alternative structures then requires additional
information constraints, especially for longer sequences (e.g. >150
nucleotides). In practice, limiting the fraction of wobble pairs
in designed structures can efficiently prevent the formation of
alternative structures with limited additional sequence constraint
(e.g. we have used 3 wobble pairs out of a total of 50 base pairs
in structures 1 and 2). This also justifies a posteriori the initial
crude estimate we have made based on WC pairs only. Ignoring
opposite effects of wobble base pairs and unwanted alternative
structures is a reasonable first approximation of global information
constraint requirement. More precise estimates are difficult
to obtain, in general, and sequence candidates should always be
tested for possible alternative structures with an appropriate RNA
folding algorithm including base pair stacking free energies (we
have used Kinefold [46], which also includes pseudoknots and knots
in RNA structures). Alternatively, it is usually possible, owing to
the available sequence entropy, to implement highly constraining
heuristics that prevent the formation of alternative structures. Typical
heuristics are based on the limitation of short complementary
substring occurrences in the sequence [51, 52, 53].
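
The counts of 6 base pairs and 14 quadruplet "circuits" used above can be recovered by brute-force enumeration. In the sketch below, a circuit is taken to be an ordered quadruplet (a, b, c, d) in which a pairs with b, b with c, c with d and d with a, one base from each of the four overlapping regions; this reading of the term, and the function names, are assumptions of this example.

```python
import math
from itertools import product

BASES = "ACGU"
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_pairs(allowed: set) -> int:
    """Number of ordered base pairs (x, y) that are allowed to pair."""
    return sum(1 for pair in product(BASES, repeat=2) if pair in allowed)


def count_circuits(allowed: set) -> int:
    """Number of quadruplets (a, b, c, d) with a-b, b-c, c-d and d-a all paired."""
    return sum(
        1
        for a, b, c, d in product(BASES, repeat=4)
        if {(a, b), (b, c), (c, d), (d, a)} <= allowed
    )


def entropy_fraction(n_choices: int, bases_per_choice: int) -> float:
    """J/N when each group of `bases_per_choice` bases admits `n_choices` options."""
    return math.log(n_choices, 4) / bases_per_choice


with_wobble = WATSON_CRICK | WOBBLE
print(count_pairs(WATSON_CRICK), count_circuits(WATSON_CRICK))  # Watson-Crick only
print(count_pairs(with_wobble), count_circuits(with_wobble))    # with GU and UG
print(round(entropy_fraction(count_pairs(with_wobble), 2), 2))     # unique structure
print(round(entropy_fraction(count_circuits(with_wobble), 4), 2))  # bistable
print(round(entropy_fraction(count_circuits(with_wobble), 8), 2))  # bistable, symmetric helices
```

With Watson-Crick pairs only, the same enumeration gives 4 pairs and 4 circuits, which recovers the simpler estimates J = N/2 and J = N/4 of the preceding paragraph.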

The previous coding requirement estimates demonstrate that
structural information $I$ and symmetry constraints $I_s$ encoded on
an RNA sequence can equivalently limit its available entropy $J$.
In particular, combinations of structure and symmetry constraints
can provide tighter upper bounds to possible coding increments $I'$
of any new feature encoded on the sequence, via the conservation
law $I + I_s + J = N$, i.e., $I' < J = N - I - I_s$.

This can be applied to estimate the minimum information that
might be required to obtain two efficient opposite folding paths
from the generic bistable RNA switch sequence of Fig. 1A. From
the previous estimate, we conclude that the two efficient folding
paths for the direct and reverse sequences do not require
constraining more than about 1/8th of the 42 bases that are paired in
both low energy structures (i.e. 'overlapping base pairs'). This
corresponds to assigning, for each folding path, a maximum of 2 or 3
overlapping base pairs not already constrained by the combination
of branched and rod-like low energy structures.

This limited coding requirement concerning overlapping base 
pairs reinforces, a posteriori, our intuitive design principles
focusing, instead, on a few bases paired in only one of the two low
energy structures (i.e. 'non-overlapping base pairs'), Figs. 1&2.
Thus, efficient co-transcriptional folding of the direct switch 
(Fig. 2) primarily relies on the sole terminal GC base pair at the 
base of Pa to prevent the nucleation of Pc, while the exchange be- 
tween Pb and Pc for the reverse switch hinges on the AC terminal 
mismatch at the base of Pb to facilitate the nucleation of Pc. 

Hence, if all non-functional sequence symmetries are lifted, we 
expect that selective folding paths can indeed be readily achieved 
for a wide class of RNA sequences, as they require little encoded 
information beyond small asymmetries between alternative helices 
to guide or prevent their successive exchanges during transcrip- 
tion. Interestingly, this pivotal role of a few unpaired or transiently 
paired bases for efficient folding paths is also observed for other 
encoded molecular functions of RNAs. For instance, a few un- 
paired conserved bases usually prove essential for ribozyme func- 
tions or in vitro selected aptamers showing remarkable binding 
efficiency to specific target molecules |4^ . 

Discussion 

Sequential vs encoded co-transcriptional folding 

Although many convincing reports have shown the importance
of co-transcriptional folding j2l|,[3llll[13,[ll[3l[3lS, the underlying
mechanisms behind efficient folding paths have remained
elusive. As recalled in the introduction, this is mainly due to intrinsic
difficulties in probing natural RNA folding paths independently
from their equilibrium structures and possibly multiple functions.

FIG. 6: Simple sequential folding of a bistable RNA switch under
sequence reversal and circular permutation.

A. Bistable generic sequence with exactly overlapping helices Pa, Pb, Pc
and Pd. A permutation between the starting and ending regions of the wild
type sequence can be obtained by genetically connecting its 5' and 3' ends
and engineering two new ends from an alternative break point in the
circularized sequence. B. Sequential folding path of the wild type vs circularly
permuted sequences. C. Different branched structures obtained for wild
type and circularly permuted sequences independently from the direction
of transcription or sequence reversal (solid vs dashed arrows). The alternative
rod-like structures (wild type Pd-Pc and circularly permuted Pa-Pb) are
not formed through sequential folding, although they are expected to
co-exist with branched structures at equilibrium (not drawn). Figs. 1&2 show,
however, that small asymmetries (2-3 bases) between overlapping helices
are sufficient to efficiently guide RNA switches into either branched or
rod-like structures (see text).

In particular, studying the equilibrium folds of increasingly 
longer 3'-truncated transcripts has been argued to miss important
out-of-equilibrium intermediates on the folding path of full
length molecules (5^ . This problem was, however, circumvented 
by using circularly permuted variants of wild type sequences to
study the folding pathways of E. coli RNaseP RNA [41, 42] and
Tetrahymena group I intron [43, 44]. In this approach depicted
with a simple RNA switch on Fig. 6, the natural 5' and 3' ends 
of the molecule are genetically connected, while two new ends 
are engineered from an alternative break point in the circularized 
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sequence. This results in a circular permutation of the various do- 
mains on the linearized RNA. The resulting permutation in their 
transcription order indirectly probes folding paths of the full length 
wild type sequence. In particular, altering the connectivity of the 
primary sequence may lead to alternative co-transcriptional folds 
as illustrated on Fig. 6. The rationale underlying such circular 
permutation scenario assumes that co-transcriptional folding pri- 
marily relies on sequential folding of the nascent chain into in- 
dependent native domains with no specific coding for transient 
base pair interactions during transcription. This suggests, in par- 
ticular, that co-transcriptional folding favors branched secondary 
structures (4J (as illustrated in Fig. 6) and that the 5' to 3' direc- 
tionality of RNA transcription plays little role as long as the differ- 
ent native domains can all successively fold during transcription. 
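
The sequential folding scenario of Fig. 6 can be illustrated with a 
minimal Python sketch. It rests on a toy representation that is not part 
of the original study: the sequence is cut into four segments numbered 
1 to 4, and each helix is assigned to a pair of segments (Pa: 1-2, 
Pb: 3-4, Pc: 2-3, Pd: 1-4), chosen so that the branched and rod-like 
structures match those named in the caption of Fig. 6. The function 
names are hypothetical, and the greedy rule used here (a helix forms as 
soon as both of its strands are transcribed and free, and never 
exchanges afterwards) encodes only the sequential folding assumption 
described above, not the helix exchanges reported in this study. 

```python
# Toy model of sequential co-transcriptional folding (illustrative only).
# Segment numbering and helix-to-segment assignments are assumptions made
# for this sketch; they are not taken from the designed sequences.
HELICES = {
    "Pa": (1, 2),
    "Pb": (3, 4),
    "Pc": (2, 3),
    "Pd": (1, 4),
}


def circular_permutation(segments, break_point):
    """Join the 5' and 3' ends, then reopen the circle at `break_point`."""
    return segments[break_point:] + segments[:break_point]


def sequential_fold(transcription_order):
    """Greedy sequential folding without any helix exchange.

    A helix forms as soon as both of its segments have been transcribed
    and neither segment is already engaged in another helix.
    """
    transcribed, paired, formed = set(), set(), []
    for segment in transcription_order:
        transcribed.add(segment)
        for name, (s1, s2) in HELICES.items():
            completed_now = segment in (s1, s2) and {s1, s2} <= transcribed
            if completed_now and not ({s1, s2} & paired):
                formed.append(name)
                paired.update((s1, s2))
    return sorted(formed)


wild_type = [1, 2, 3, 4]
permuted = circular_permutation(wild_type, 1)  # segments in order 2, 3, 4, 1
reversed_wt = wild_type[::-1]                  # segments in order 4, 3, 2, 1

print("wild type:         ", sequential_fold(wild_type))
print("circularly permuted:", sequential_fold(permuted))
print("reversed wild type: ", sequential_fold(reversed_wt))
```

Under this rule the wild type order yields the helices Pa and Pb 
whether it is read forward or backward, while the circularly permuted 
order yields Pc and Pd, in line with panel C of Fig. 6. 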

By contrast, our results demonstrate that folding paths can 
efficiently guide RNA transcripts into distinct alternative struc- 
tures even when competing branched-like conformations exist and 
could, in principle, form during transcription. This competition 
between local overlapping helices and even global alternative 
structures is, in fact, ubiquitous in the folding dynamics and 
thermodynamics of RNA molecules. For instance, co-transcriptional 
folding has long been known to induce structural rearrangements 
as the nascent RNA chain is being transcribed, demonstrating 
that transient helices almost inevitably participate in 
co-transcriptional folding paths. More recently, transient helices were 
also shown to affect force-induced unfolding paths of single RNA 
molecules in micromechanical experiments [57] (e.g. for the 
1540-nt-long E. coli 16S rRNA). 

The present study, following earlier stochastic folding 
simulations, demonstrates that this unavoidable competition 
between alternative base pairs can be exploited to precisely 
encode co-transcriptional folding paths of RNA sequences. It is 
primarily achieved through branch migration exchanges between 
transient and native helices forming successively as transcription 
proceeds from the 5' to the 3' end of the sequence. This 
experimental finding is, in fact, corroborated by a recent 
statistical analysis of non-coding RNA sequences by Meyer 
& Miklos, who demonstrated the existence of a 5'-3' vs 3'-5' 
asymmetry in the relative positional correlation between native 
and non-native helices along primary sequences. 

Information content and RNA evolution 

Non-coding RNAs typically tolerate a significant number of 
neutral mutations and covariations in their sequence, which pre- 
sumably facilitates their continuous adaptation to environmental 
changes. From an information content perspective, this tolerance 
to (concerted) mutations also suggests that (partly) unconstrained 
nucleotides may be used to encode other alternative structures and 
functions on the same RNA sequence, a feature which might have 
favored the emergence of new functional RNAs and RNA switches 
in the course of evolution [59, 60]. In fact, much more functional, 
as well as non-functional, information can be encoded on an RNA 
sequence. For instance, the pair of RNA switches we have 
designed (Fig. 2) demonstrates that not only alternative structures but 
also selective folding pathways and strong sequence symmetries 
can all be encoded simultaneously on the same RNA sequence. 
While such strong sequence symmetries are both non-functional 
and probably too stringent to emerge and adapt through natural 
or in vitro selection, they can be used to provide quantitative 
information on other encoded features of interest. For instance, 
equalizing stacking contributions between alternative folds using 
helix symmetries also provides a powerful differential approach 
to uncover other free energy contributions from non-canonical 
tertiary structure motifs. In the present study, small but 
reproducible differences in band separation between direct and 
reverse switches at equilibrium (Fig. 2) may reflect a difference 
in tilt angle between helices Pc and Pd, due to differently 
structured interior loops in their respective rod-like structures. 

In this study, we showed that co-transcriptional folding can 
efficiently guide RNA folding either towards branched structures (as 
for the 'direct' switch, Fig. 2) or towards elongated rod-like 
structures (as for the 'reverse' switch, Fig. 2) even though their 
helices are mutually identical. We also argued that only limited 
information is necessary to encode such selective folding paths for 
generic RNA switches: it essentially amounts to encoding the 
relative lengths of helices forming successively during transcription. 
Moreover, this strict hierarchy between successively exchanging 
helices can be somewhat alleviated by resorting to topological 
barriers based on 'entangled' helices (i.e. simple co-transcriptional 
knots). Hence, we expect that the present findings concerning 
short RNA sequences (i.e. <100 bases) may be applied to 
design efficient folding paths for a wide class of larger RNA 
target structures. This could be achieved by encoding different series 
of local folding events leading to a succession of either rod-like 
or branched motifs at the 3' end of the nascent RNA molecule 
during transcription. Such a folding scheme also provides a 
theoretical framework to analyze selective folding paths of natural 
non-coding RNAs. 

Finally, these results suggest that efficient folding pathways 
might have easily emerged and continuously adapted in the course 
of evolution in the same way as functional native structures have, 
through mutation drift in sequence space: non-deleterious 
mutations are mostly neutral and conserve sequence folds and 
activity, while new functions may occasionally arise by rare 
hopping between intersecting networks of neutral mutations 
("neutral networks"). Furthermore, the fact that encoded folding 
paths may be redirected through transient antisense interactions 
(Figs. 4,5) provides simple 'all-RNA' mechanisms to regulate 
the functional folding of RNAs in the absence of any elaborate 
control at the level of transcription initiation. From an ancestral 
"RNA World" perspective, this constitutive coupling between 
RNA synthesis and RNA folding regulation may have also enabled 
the early emergence of autonomous RNA-based networks relying 
solely on intra- and intermolecular base pair interactions. Indeed, 
RNA molecules cross-regulating their respective encoded folding 
paths could, in principle, be combined to perform essential 
regulation tasks characteristic of all natural and engineered control 
networks (e.g. negative and positive feedback loops, feedforward 
loops, toggle switches, oscillators, etc.). 